Tailored Feature Extraction for Lexical Disambiguation of English Verbs Based on Corpus Pattern Analysis

Authors

  • Martin Holub
  • Vincent Kríž
  • Silvie Cinková
  • Eckhard Bick
Abstract

We report on a detailed study of the automatic lexical disambiguation of 30 sample English verbs. We drew on a lexicon of English verb patterns based on Corpus Pattern Analysis (CPA), a novel lexicographic method that clusters verb uses according to the morpho-syntactic, lexical, and semantic/pragmatic similarity of their contexts rather than associating them with abstract semantic definitions. We trained several statistical classifiers to recognize these patterns, using morpho-syntactic as well as semantic features. In this paper we concentrate mainly on the procedures for feature extraction and feature selection and on their evaluation. We show that features tailored to individual verbs, which are implicit in the pattern definitions given explicitly in the lexicon, have the potential to significantly improve the accuracy of supervised statistical classifiers.

Title and abstract in Czech

Rysy šité na míru anglickým slovesům pro automatickou lexikální disambiguaci pomocí Corpus Pattern Analysis

Předkládáme detailní studii automatické lexikální disambiguace na pilotním vzorku třiceti anglických sloves za použití lexikonu vzorů slovesných užití (patterns), který vychází z Corpus Pattern Analysis (CPA). Tato inovátorská lexikografická metoda namísto na abstraktních definicích jednotlivých významů staví na souhře morfosyntaktické, lexikální a sémantické/pragmatické podobnosti slovesných užití. Natrénovali jsme několik statistických klasifikátorů na rozpoznávání těchto vzorů. Klasifikátory využívají jak morfosyntaktických, tak sémantických rysů. V naší studii se soustředíme na procedury pro extrakci rysů, jejich výběr a jejich evaluaci. Ukazujeme, že rysy na míru uzpůsobené jednotlivým slovesům, jež jsou implicitně obsaženy v definici každého vzoru v lexikonu, mají potenciál významně zvýšit přesnost statistických klasifikátorů s učitelem.

Similar resources

Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study

We present a pilot analysis of a new linguistic resource, VPS-GradeUp (available at http://hdl.handle.net/11234/1-1585). The resource contains 11,400 graded human decisions on usage patterns of 29 English lexical verbs, randomly selected from the Pattern Dictionary of English Verbs (Hanks, 200

Automated Verb Sense Labelling Based on Linked Lexical Resources

We present a novel approach for creating sense-annotated corpora automatically. Our approach employs shallow syntactico-semantic patterns derived from linked lexical resources to automatically identify instances of word senses in text corpora. We evaluate our labelling method intrinsically on SemCor and extrinsically by using automatically labelled corpus text to train a classifier for verb sens...

Concordance-Based Data-Driven Learning Activities and Learning English Phrasal Verbs in EFL Classrooms

In spite of the highly beneficial applications of corpus linguistics in language pedagogy, it has not found its way into mainstream EFL. The major reasons seem to be the teachers’ lack of training and the unavailability of resources, especially computers in language classes. Phrasal verbs have been shown to be a problematic area of learning English as a foreign language due to their semantic op...

Word sense disambiguation with pattern learning and automatic feature selection

This paper presents a novel approach for word sense disambiguation. The underlying algorithm has two main components: (1) pattern learning from available sense-tagged corpora (SemCor), from dictionary definitions (WordNet) and from a generated corpus (GenCor); and (2) instance based learning with automatic feature selection, when training data is available for a particular word. The ideas descr...

VPS-GradeUp: Graded Decisions on Usage Patterns

We present VPS-GradeUp – a set of 11,400 graded human decisions on usage patterns of 29 English lexical verbs from the Pattern Dictionary of English Verbs by Patrick Hanks. The annotation contains, for each verb lemma, a batch of 50 concordances with the given lemma as KWIC, and for each of these concordances we provide a graded human decision on how well the individual PDEV patterns for this p...

Publication date: 2012